TruthfulQA: Measuring How Models Mimic Human Falsehoods

The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics.

We crafted the questions so that some humans would answer them falsely because of a false belief or misconception.